suppressPackageStartupMessages(suppressWarnings({
    library(dplyr)
    library(tidyr)
    library(ggplot2)
    library(knitr)
}))

knitr::opts_chunk$set(message=FALSE, warning=FALSE, cache=FALSE, fig.width=10, fig.height=7)

Background

The University of Bristol UoB plans to commence the 2020 academic year with a strategy of placing students in living circles, to minimise the spread of SARS-CoV-2 across the student body. To monitor outbreaks it would be costly to simply test everybody on a regular basis, and so whether pooling samples could effectively reduce costs and still adequately detect outbreaks is an open question.

These simulations seek to explore pooling strategies that aim to

Distribution of students per living circle:

load("../data/circles.rdata")
hist(ids$count, xlab="Number of students per circle", breaks=40, main="")

Setup

Our objective is to simulate the RT-PCR test for presence of SARS-Cov-2 in samples from all individuals under different disease transmission scenarios. There are four main components to the model – viral load sampling, disease transmission between students, pooling allocation of collected samples, and testing performance. We compare different aspects of testing performance across three strategies – per-individual testing, pooled testing, or pooled testing with per-individual follow-up in positive pools.

Viral load in infected individuals

We assume the viral load changes over time following a log normal curve. The viral load for each individual \(V_i\left(t\right)\) over time is hence given by

\[ V_i\left(t\right)=\frac{1}{\sigma\sqrt{2\pi}}e^{-\frac{1}{2}\left(\frac{log\left(t\right)-\mu}{\sigma}\right)^2}\times{10}^{l_i} \]

Where t ranges from 0 to 14 days and l_i represents the per-individual scaling value of the viral load. We select the parameters =0 and =0.8, and sample the scaling value as

l_iBeta(0.3,1)3

Finally, to obtain an individual’s viral load at the time of sample collection, we sample from V_i(t) uniformly across the number of days of infection, and denote their sampled viral load as V_{i,0}.

Infections in the population

At time step 0, a random set of individuals is selected to be infected according to some initial prevalence P(0). The number of people each individual goes on to infect is set as

\[ R_i\sim Poisson\left(R\right) \]

We allow the \(R\) value to vary across simulation scenarios.

  • high = within circle = 0.9, within location = 0.09, anywhere else = 0.01
  • medium = within circle = 0.6, within location = 0.3, anywhere else = 0.1
  • low = within circle = 0.34, within location = 0.33, anywhere = 0.33

Assay sensitivity

Based on analysis from Adam Finn and Amy Thomas

  • 1x dilution = 0.998 sensitivity
  • 10x dilution = 0.8 sensitivity

The RT-PCR testing process involves an exponential growth phase of viral RNA, such that after a number of replication cycles n, the viral load

\[ V_{i,n}=\frac{\sum_{D} V_{i,0}}{D}\left(1+E\right)^n \]

where \(E\) is the efficiency of the reaction, \(\sum_{D} V_{i,0}\) is the starting viral load in the sample, and \(D\) is the number samples in the testing pool. In the case of the individual test, \(D=1\) and the \(V_{i,0}\) is the viral load of the individual. We assume that the efficiency lies between 0.65 and 0.9, such as

\[ E\sim Beta\left(\alpha,\beta\right)\times\left(0.9-0.65\right)+0.65 \]

In order to test positive, the RT-PCR test must detect a viral load of some fluorescence threshold corresponding to a some number of RNA copies \(V_{C_T}\) after some number of cycles \(n=C_t\). We can write down the \(V_{C_t}\) value given \(C_t\), distribution of \(V_{i,0}\) values, dilution factor \(D\), parameter values for the distribution of efficiencies \(E\), and the expected test sensitivity as a quantile function

\[ V_{C_t}=Q\left(D,R_0,C_t,\alpha,\beta,{1-T}_{se}\right) \]

A distribution of \(V_{i,0}\) values can be obtained from the viral load model. Through experimental calibration we expect that the average sensitivity of the test is \(T_{se}=0.98\) when \(D=1\), falling to 0.8 when \(D=10\) using \(C_t=35\) cycles. Hence, we must identify a set of \(\alpha\) and \(\beta\) parameters that determine the test efficiency distribution that satisfies the calibrated test sensitivities given the assumed distribution of viral loads in the samples. To achieve this we use a general optimisation function, where we want to find an \(V_{C_t}\) threshold that is estimated to be identical for the two dilution scenarios:

\[ \underset{\alpha, \beta}{\arg \min} \left ( Q \left(D=1,R_0,C_t=35,\alpha,\beta,T_{se}=0.02\right)-Q\left(D=10,R_0,C_t=35,\alpha, \beta,T_{se}=0.2\right) \right)^2 \]

We used the optim function in R 4.0.2 to solve this optimisation function, obtaining parameter values of \(\alpha=\ 2.60\) and \(\beta=1.33\).

See the ct.html document (ct.rmd) on how this is done.

Previous to this

Was previously not allowing any heterogeneity between Ct values using the following - now deprecated

Assume 98% of \(TPR=0.98\) and specificity of \(TNR = 1\) for a single sample (not pooled / diluted). By pooling samples, the dilution is assumed to be the proportion positive divided by the proportion in the pool. In order to simulate this appropriately we need a function that will relate sensitivity to dilution fraction.

Use a sigmoid function that allows

\[ TPR_{d} = -\frac{1}{1 + e^{-d/4 + 4}} + 1 \]

where \(d\) is the dilution factor, calculated as the total number of people in the pool divided by the number of infected people in the pool. Ideally would get some empirical results on this (e.g. see this paper that suggests 8x dilution doesn't affect the assay https://linkinghub.elsevier.com/retrieve/pii/S1198743X20303499).

Pooling

  • Random pooling
  • Pooling by living circle

In the latter we use a bin packing algorithm to minimise distribution of living circles across test pools

To maximise the efficacy of pooling, assume that more transmission will occur within circles, and so pooling based on circles should reduce the dilution factor. For the simulations, determine a set assay pool size and allocate individuals to pools as follows.

  1. If the circle size is the same as the pool size, then keep as a single pool
  2. If the circle size is smaller than the pool size, use a bin packing algorithm to minimise the number of pools across all the smaller circles
  3. If a circle size is larger than the pool size, break it up to have some number of complete pools and the remainder goes into (2).

Once the pools are determined, in order to report on whether circles are infected need to reassemble the circles from the pools and declare if a particular circle has any infected based on whether any of its constituent pools have been found positive.

The simulations also follow up positive pools by individually testing everyone within that pool.

Outbreak

Define outbreak as 2 people in a location being infected. This is detected either from

  • individual tests returning 2 people in a location
  • one location pool turning up positive
  • pool + followup tests returning 2 people in a location

Note: this needs to be improved in the model

Results

load("../results/sim.rdata")

Use the following costs:

res$nstudent <- res$reagentuse.ind_tests

Cost difference in pooling vs pooling with followup

assay_ppv <- function(sensitivity, prevalence, specificity)
{
    sensitivity * prevalence / (sensitivity * prevalence + (1-specificity) * (1-prevalence))
}

ress <- res %>%
    group_by(prevalence, spread, containment, pool_size, random_pooling) %>%
    summarise(
        cost.ind_tests = mean(cost.ind_tests, na.rm=TRUE),
        cost.pool_tests = mean(cost.pool_tests, na.rm=TRUE),
        cost.poolfollowup_tests = mean(cost.poolfollowup_tests, na.rm=TRUE),
        reagentuse.ind_tests = mean(reagentuse.ind_tests, na.rm=TRUE),
        reagentuse.pool_tests = mean(reagentuse.pool_tests, na.rm=TRUE),
        reagentuse.poolfollowup_tests = mean(reagentuse.poolfollowup_tests, na.rm=TRUE),
        sensitivity.ind_tests = mean(sensitivity.ind_tests, na.rm=TRUE),
        sensitivity.pool_tests = mean(sensitivity.pool_tests, na.rm=TRUE),
        sensitivity.poolfollowup_tests = mean(sensitivity.poolfollowup_tests, na.rm=TRUE),
        specificity.ind_tests = mean(specificity.ind_tests, na.rm=TRUE),
        specificity.pool_tests = mean(specificity.pool_tests, na.rm=TRUE),
        specificity.poolfollowup_tests = mean(specificity.poolfollowup_tests, na.rm=TRUE),
        sensitivity.outbreak.ind_tests = mean(sensitivity.outbreak.ind_tests, na.rm=TRUE),
        sensitivity.outbreak.pool_outbreak_detected = mean(sensitivity.outbreak.pool_outbreak_detected[is.finite(sensitivity.outbreak.pool_outbreak_detected)], na.rm=TRUE),
        sensitivity.outbreak.poolfollowup_outbreak_detected = mean(sensitivity.outbreak.poolfollowup_outbreak_detected[is.finite(sensitivity.outbreak.poolfollowup_outbreak_detected)], na.rm=TRUE),
        specificity.outbreak.ind_tests = mean(specificity.outbreak.ind_tests[is.finite(specificity.outbreak.ind_tests)], na.rm=TRUE),
        specificity.outbreak.pool_outbreak_detected = mean(specificity.outbreak.pool_outbreak_detected[is.finite(specificity.outbreak.pool_outbreak_detected)], na.rm=TRUE),
        specificity.outbreak.poolfollowup_outbreak_detected = mean(specificity.outbreak.poolfollowup_outbreak_detected[is.finite(specificity.outbreak.poolfollowup_outbreak_detected)], na.rm=TRUE),
        true_prevalence=mean(true_prevalence),
        prevalence.ind_tests = mean(prevalence.ind_tests, na.rm=TRUE),
        prevalence.pool_tests = mean(prevalence.pool_tests, na.rm=TRUE),
        prevalence.poolfollowup_tests = mean(prevalence.poolfollowup_tests, na.rm=TRUE),
        efficiency.ind_tests = mean(sensitivity.ind_tests / cost.ind_tests),
        efficiency.pool_tests = mean(sensitivity.pool_tests / cost.pool_tests),
        efficiency.poolfollowup_tests = mean(sensitivity.poolfollowup_tests / cost.poolfollowup_tests),
        ppv.ind_tests = assay_ppv(sensitivity.ind_tests, true_prevalence, specificity.ind_tests),
        ppv.pool_tests = assay_ppv(sensitivity.pool_tests, true_prevalence, specificity.pool_tests),
        ppv.poolfollowup_tests = assay_ppv(sensitivity.poolfollowup_tests, true_prevalence, specificity.poolfollowup_tests),
        efficiency2.ind_tests = mean(ppv.ind_tests / cost.ind_tests),
        efficiency2.pool_tests = mean(ppv.pool_tests / cost.pool_tests),
        efficiency2.poolfollowup_tests = mean(ppv.poolfollowup_tests / cost.poolfollowup_tests),
        sensitivity.ind_lfd = 0.76,
        sensitivity.ind_lfd2 = 1-(1-0.76)^2,
        specificity.ind_lfd = 0.9965,
        specificity.ind_lfd2 = 0.9965^2,
        reagentuse.ind_lfd = 0,
        reagentuse.ind_lfd2 = 0,
        ppv.ind_lfd = assay_ppv(sensitivity.ind_lfd, true_prevalence, specificity.ind_lfd),
        ppv.ind_lfd2 = assay_ppv(sensitivity.ind_lfd2, true_prevalence, specificity.ind_lfd2),
        cost.ind_lfd = reagentuse.ind_tests * 5,
        cost.ind_lfd2 = reagentuse.ind_tests * 10,
        prevalence.ind_lfd = sensitivity.ind_lfd * true_prevalence + (1-specificity.ind_lfd),
        prevalence.ind_lfd2 = sensitivity.ind_lfd2 * true_prevalence + (1-specificity.ind_lfd2),
        efficiency.ind_lfd = mean(sensitivity.ind_lfd / cost.ind_lfd),
        efficiency.ind_lfd2 = mean(sensitivity.ind_lfd2 / cost.ind_lfd2),
        efficiency2.ind_lfd = mean(ppv.ind_lfd / cost.ind_lfd),
        efficiency2.ind_lfd2 = mean(ppv.ind_lfd2 / cost.ind_lfd2)
    )
ress$group <- paste(ress$prevalence, ress$spread, ress$containment, ress$random_pooling)

temp <- group_by(ress, prevalence, spread, containment) %>%
    summarise(
        pool_size=1, 
        random_pooling=FALSE,
        cost.ind_tests = mean(cost.ind_tests, na.rm=TRUE),
        cost.ind_lfd = mean(cost.ind_lfd, na.rm=TRUE),
        cost.ind_lfd2 = mean(cost.ind_lfd2, na.rm=TRUE),
        reagentuse.ind_tests = mean(reagentuse.ind_tests, na.rm=TRUE),
        reagentuse.ind_lfd = mean(reagentuse.ind_lfd, na.rm=TRUE),
        reagentuse.ind_lfd2 = mean(reagentuse.ind_lfd2, na.rm=TRUE),
        sensitivity.ind_tests = mean(sensitivity.ind_tests, na.rm=TRUE),
        sensitivity.ind_lfd = mean(sensitivity.ind_lfd, na.rm=TRUE),
        sensitivity.ind_lfd2 = mean(sensitivity.ind_lfd2, na.rm=TRUE),
        specificity.ind_tests = mean(specificity.ind_tests, na.rm=TRUE),
        specificity.ind_lfd = mean(specificity.ind_lfd, na.rm=TRUE),
        specificity.ind_lfd2 = mean(specificity.ind_lfd2, na.rm=TRUE),
        prevalence.ind_tests = mean(prevalence.ind_tests, na.rm=TRUE),
        prevalence.ind_lfd = mean(prevalence.ind_lfd, na.rm=TRUE),
        prevalence.ind_lfd2 = mean(prevalence.ind_lfd2, na.rm=TRUE),
        efficiency.ind_tests = mean(efficiency.ind_tests, na.rm=TRUE),
        efficiency.ind_lfd = mean(efficiency.ind_lfd, na.rm=TRUE),
        efficiency.ind_lfd2 = mean(efficiency.ind_lfd2, na.rm=TRUE),
        efficiency2.ind_tests = mean(efficiency2.ind_tests, na.rm=TRUE),
        efficiency2.ind_lfd = mean(efficiency2.ind_lfd, na.rm=TRUE),
        efficiency2.ind_lfd2 = mean(efficiency2.ind_lfd2, na.rm=TRUE),
        ppv.ind_tests = mean(ppv.ind_tests, na.rm=TRUE),
        ppv.ind_lfd = mean(ppv.ind_lfd, na.rm=TRUE),
        ppv.ind_lfd2 = mean(ppv.ind_lfd2, na.rm=TRUE),
        true_prevalence = mean(true_prevalence, na.rm=TRUE)
    )
ress2 <- rbind(ress, temp)
ress2$pooling_type <- "Clustered"
ress2$pooling_type[ress2$random_pooling] <- "Random"

rename_labels <- function(x)
{
    strsplit(x, split="\\.") %>%
    lapply(., function(y)
    {
        y <- y[2]
        y[y=="ind_tests"] <- "Individual PCR"
        y[y=="ind_lfd"] <- "Individual LFD"
        y[y=="ind_lfd2"] <- "Individual LFD x2"
        y[y=="pool_tests"] <- "Pooled PCR"
        y[y=="poolfollowup_tests"] <- "Pooled PCR + followup"
        y
    }) %>% unlist
}

What is the extra cost incurred by following up on positive pools to get infected individuals (R0 (cols) vs baseline prevalence (rows))?

tt <- ress2 %>% subset(., containment == "conquest") %>%
dplyr::select(cost.ind_tests, cost.pool_tests, cost.poolfollowup_tests, cost.ind_lfd, cost.ind_lfd2, group, containment, pool_size, prevalence, spread, pooling_type) %>%
tidyr::gather(key="key", value="cost", cost.pool_tests, cost.poolfollowup_tests, cost.ind_tests, cost.ind_lfd, cost.ind_lfd2) %>%
subset(., grepl("pool", key) | grepl("ind", key) & pool_size == 1) %>%
mutate(
    key=paste0(rename_labels(key), " (", pooling_type, ")"), 
    spread=paste("R =", spread), prevalence=paste("prv =", prevalence)) %>%
arrange(pool_size)
tt$key[grepl("Individual", tt$key)] <- gsub(" \\(Clustered)", "", tt$key[grepl("Individual", tt$key)])

ggplot(subset(tt, pool_size != 1), aes(x=pool_size, y=cost)) +
geom_point(aes(colour=key)) +
geom_line(aes(colour=key)) +
geom_hline(data=subset(tt, grepl("Indiv", key)), aes(yintercept=cost, linetype=key)) +
facet_grid(prevalence ~ spread) +
scale_fill_brewer(type="qual", palette=2) +
theme_bw() +
scale_x_continuous(breaks=subset(tt, pool_size != 1) %>% {unique(.$pool_size)}) +
labs(x="Pool size", linetype="", colour="")

Another visualisation of the same thing, showing that impact of clustering isn't very high

ress %>% mutate(spread=paste("R =", spread), prevalence=paste("prv =", prevalence)) %>%
ggplot(., aes(x=cost.pool_tests, y=cost.poolfollowup_tests, group=group)) +
geom_point(aes(colour=containment, size=pool_size, shape=random_pooling)) +
geom_line(aes(colour=containment, linetype=random_pooling)) +
geom_abline(slope=1) +
facet_grid(prevalence ~ spread) +
scale_colour_brewer() +
theme_bw() +
theme(axis.text.x=element_text(angle=90, hjust=1, vjust=0.5)) +
labs(x="Cost of pooled PCR", y="Cost of pooled PCR with followup") 

Followup tests are relatively more expensive when the spread is higher and the prevalence is higher. Similar for reagent use:

tt <- ress2 %>% subset(., containment == "conquest") %>%
dplyr::select(reagentuse.ind_tests, reagentuse.pool_tests, reagentuse.poolfollowup_tests, reagentuse.ind_lfd, reagentuse.ind_tests, group, containment, pool_size, prevalence, spread, pooling_type) %>%
tidyr::gather(key="key", value="reagentuse", reagentuse.pool_tests, reagentuse.poolfollowup_tests, reagentuse.ind_tests, reagentuse.ind_lfd) %>%
subset(., grepl("pool", key) | grepl("ind", key) & pool_size == 1) %>%
mutate(
    key=paste0(rename_labels(key), " (", pooling_type, ")"), 
    spread=paste("R =", spread), prevalence=paste("prv =", prevalence)) %>%
arrange(pool_size)
tt$key[tt$key == "Individual PCR (Clustered)"] <- "Individual PCR"
tt$key[tt$key == "Individual LFD (Clustered)"] <- "Individual LFD"
tt2 <- subset(tt, pool_size == 2)
tt2$pool_size <- 1
tt2$reagentuse <- subset(tt, key == "Individual PCR")$reagentuse[1]
tt <- rbind(subset(tt, pool_size != 1), tt2)

ggplot(tt, aes(x=pool_size, y=reagentuse)) +
geom_point(aes(colour=key)) +
geom_line(aes(colour=key)) +
facet_grid(prevalence ~ spread) +
scale_fill_brewer(type="qual", palette=2) +
theme_bw() +
scale_x_continuous(breaks=unique(tt$pool_size)) +
labs(x="Pool size", linetype="", colour="", y="Number of RT-qPCR tests")

Example

x <- filter(tt, containment == "conquest", pool_size==20, prevalence == "prv = 0.05", spread == "R = 3")
x$reagentuse[3]/x$reagentuse[4]
## [1] 0.8545004

Sensitivity of pool vs pool+followup is basically the same, but both drop sharply with larger pool size given current dilution curves (R0 (cols) vs baseline prevalence (rows)):

tt <- ress2 %>% subset(., containment == "conquest") %>%
dplyr::select(sensitivity.ind_tests, sensitivity.pool_tests, sensitivity.poolfollowup_tests, sensitivity.ind_lfd, sensitivity.ind_lfd2, group, containment, pool_size, prevalence, spread, pooling_type) %>%
tidyr::gather(key="key", value="sensitivity", sensitivity.pool_tests, sensitivity.poolfollowup_tests, sensitivity.ind_tests, sensitivity.ind_lfd, sensitivity.ind_lfd2) %>%
subset(., grepl("pool", key) | grepl("ind", key) & pool_size == 1) %>%
mutate(
    key=paste0(rename_labels(key), " (", pooling_type, ")"), 
    spread=paste("R =", spread), prevalence=paste("prv =", prevalence)) %>%
arrange(pool_size)
tt$key[grepl("Individual", tt$key)] <- gsub(" \\(Clustered)", "", tt$key[grepl("Individual", tt$key)])

ggplot(subset(tt, pool_size != 1), aes(x=pool_size, y=sensitivity)) +
geom_point(aes(colour=key)) +
geom_line(aes(colour=key)) +
geom_hline(data=subset(tt, grepl("Indiv", key)), aes(yintercept=sensitivity, linetype=key)) +
facet_grid(prevalence ~ spread) +
scale_fill_brewer(type="qual", palette=2) +
theme_bw() +
scale_x_continuous(breaks=subset(tt, pool_size != 1) %>% {unique(.$pool_size)}) +
labs(x="Pool size", linetype="", colour="")

Specificity

tt <- ress2 %>% subset(., containment == "conquest") %>%
dplyr::select(specificity.ind_tests, specificity.pool_tests, specificity.poolfollowup_tests, specificity.ind_lfd, specificity.ind_tests, specificity.ind_lfd2, group, containment, pool_size, prevalence, spread, pooling_type) %>%
tidyr::gather(key="key", value="specificity", specificity.pool_tests, specificity.poolfollowup_tests, specificity.ind_tests, specificity.ind_lfd, specificity.ind_lfd2) %>%
subset(., grepl("pool", key) | grepl("ind", key) & pool_size == 1) %>%
mutate(
    key=paste0(rename_labels(key), " (", pooling_type, ")"), 
    spread=paste("R =", spread), prevalence=paste("prv =", prevalence)) %>%
arrange(pool_size)
tt$key[grepl("Individual", tt$key)] <- gsub(" \\(Clustered)", "", tt$key[grepl("Individual", tt$key)])

ggplot(subset(tt, pool_size != 1), aes(x=pool_size, y=1-specificity)) +
geom_point(aes(colour=key)) +
geom_line(aes(colour=key)) +
geom_hline(data=subset(tt, grepl("Indiv", key)), aes(yintercept=1-specificity, linetype=key)) +
facet_grid(prevalence ~ spread) +
scale_fill_brewer(type="qual", palette=2) +
# theme_bw() +
scale_x_continuous(breaks=subset(tt, pool_size != 1) %>% {unique(.$pool_size)}) +
scale_y_log10() +
labs(x="Pool size", linetype="", colour="", y="False positive rate (log10 scale)")

How many unnecessary quarantines if we quarantine whole circles when a positive pool comes back (R0 (cols) vs baseline prevalence (rows)):

ggplot(ress, aes(y=specificity.pool_tests, x=pool_size, group=group)) +
geom_point(aes(colour=containment, shape=random_pooling)) +
geom_line(aes(colour=containment, linetype=random_pooling)) +
facet_grid(prevalence ~ spread) +
scale_colour_brewer() +
labs(x="Pool size", y="Specificity of pool tests", colour="R", shape="Random pooling", linetype="Random pooling")

ggplot(ress, aes(y=specificity.poolfollowup_tests, x=pool_size, group=group)) +
geom_point(aes(colour=containment, shape=random_pooling)) +
geom_line(aes(colour=containment, linetype=random_pooling)) +
facet_grid(prevalence ~ spread) +
scale_colour_brewer() +
labs(x="Pool size", y="Specificity of pool+followup tests", colour="R", shape="Random pooling", linetype="Random pooling")

ggplot(ress, aes(y=sensitivity.pool_tests, x=pool_size, group=group)) +
geom_point(aes(colour=containment, shape=random_pooling)) +
geom_line(aes(colour=containment, linetype=random_pooling)) +
facet_grid(prevalence ~ spread) +
scale_colour_brewer() +
labs(x="Pool size", y="Sensitivity of pool tests", colour="R", shape="Random pooling", linetype="Random pooling")

ggplot(ress, aes(y=sensitivity.poolfollowup_tests, x=pool_size, group=group)) +
geom_point(aes(colour=containment, shape=random_pooling)) +
geom_line(aes(colour=containment, linetype=random_pooling)) +
facet_grid(prevalence ~ spread) +
scale_colour_brewer() +
labs(x="Pool size", y="Sensitivity of pool+followup", colour="R", shape="Random pooling", linetype="Random pooling")

Positive predictive value

tt <- ress2 %>% subset(., containment == "conquest") %>%
dplyr::select(ppv.ind_tests, ppv.pool_tests, ppv.poolfollowup_tests, ppv.ind_lfd, ppv.ind_lfd2, group, containment, pool_size, prevalence, spread, pooling_type) %>%
tidyr::gather(key="key", value="ppv", ppv.pool_tests, ppv.poolfollowup_tests, ppv.ind_tests, ppv.ind_lfd, ppv.ind_lfd2) %>%
subset(., grepl("pool", key) | grepl("ind", key) & pool_size == 1) %>%
mutate(
    key=paste0(rename_labels(key), " (", pooling_type, ")"), 
    spread=paste("R =", spread), prevalence=paste("prv =", prevalence)) %>%
arrange(pool_size)
tt$key[grepl("Individual", tt$key)] <- gsub(" \\(Clustered)", "", tt$key[grepl("Individual", tt$key)])

ggplot(subset(tt, pool_size != 1), aes(x=pool_size, y=ppv)) +
geom_point(aes(colour=key)) +
geom_line(aes(colour=key)) +
geom_hline(data=subset(tt, grepl("Indiv", key)), aes(yintercept=ppv, linetype=key)) +
facet_grid(prevalence ~ spread) +
scale_fill_brewer(type="qual", palette=2) +
theme_bw() +
scale_x_continuous(breaks=subset(tt, pool_size != 1) %>% {unique(.$pool_size)}) +
labs(x="Pool size", linetype="", colour="", y="Positive predictive value")

Estimating prevalence

tt <- ress2 %>% subset(., containment == "conquest") %>%
dplyr::select(prevalence.ind_tests, prevalence.ind_lfd, prevalence.ind_lfd2, prevalence.pool_tests, prevalence.poolfollowup_tests, group, containment, pool_size, pooling_type, prevalence, spread, true_prevalence) %>%
tidyr::gather(key="key", value="est_prevalence", prevalence.pool_tests, prevalence.poolfollowup_tests, prevalence.ind_tests, prevalence.ind_lfd, prevalence.ind_lfd2) %>%
subset(., grepl("pool", key) | grepl("ind", key) & pool_size == 1) %>%
subset(., pool_size %in% c(1,2,5,10,20,30)) %>%
mutate(key=paste0(rename_labels(key), " (", pooling_type, ")"))
tt$key[grepl("Individual", tt$key)] <- gsub(" \\(Clustered)", "", tt$key[grepl("Individual", tt$key)])

ggplot(tt, aes(x=true_prevalence, y=est_prevalence)) +
geom_line(aes(colour=as.factor(pool_size), linetype=as.factor(prevalence))) +
geom_point(aes(colour=as.factor(pool_size), shape=as.factor(prevalence))) +
facet_wrap(~ key, ncol=4, scale="free") +
geom_abline(slope=1) +
scale_colour_brewer(type="qual") +
theme_bw() +
labs(x="True prevalence", y="Estimated prevalence", colour="Pool size", linetype="Initial prevalence", shape="Initial prevalence")

Efficiency - sensitivity vs cost ratios

tt <- ress2 %>% subset(., containment == "conquest") %>%
dplyr::select(efficiency.ind_tests, efficiency.pool_tests, efficiency.poolfollowup_tests, efficiency.ind_lfd, efficiency.ind_lfd2, group, containment, pool_size, prevalence, spread, pooling_type) %>%
tidyr::gather(key="key", value="efficiency", efficiency.pool_tests, efficiency.poolfollowup_tests, efficiency.ind_tests, efficiency.ind_lfd, efficiency.ind_lfd2) %>%
subset(., grepl("pool", key) | grepl("ind", key) & pool_size == 1) %>%
mutate(
    key=paste0(rename_labels(key), " (", pooling_type, ")"), 
    spread=paste("R =", spread), prevalence=paste("prv =", prevalence)) %>%
arrange(pool_size)
tt$key[grepl("Individual", tt$key)] <- gsub(" \\(Clustered)", "", tt$key[grepl("Individual", tt$key)])

ggplot(subset(tt, pool_size != 1), aes(x=pool_size, y=efficiency)) +
geom_point(aes(colour=key)) +
geom_line(aes(colour=key)) +
geom_hline(data=subset(tt, grepl("Indiv", key)), aes(yintercept=efficiency, linetype=key)) +
facet_grid(prevalence ~ spread) +
scale_fill_brewer(type="qual", palette=2) +
theme_bw() +
scale_x_continuous(breaks=subset(tt, pool_size != 1) %>% {unique(.$pool_size)}) +
labs(x="Pool size", linetype="", colour="", y="Sensitivity / cost")

tt <- ress2 %>% subset(., containment == "conquest") %>%
dplyr::select(efficiency2.ind_tests, efficiency2.pool_tests, efficiency2.poolfollowup_tests, efficiency2.ind_lfd, efficiency2.ind_lfd2, group, containment, pool_size, prevalence, spread, pooling_type) %>%
tidyr::gather(key="key", value="efficiency2", efficiency2.pool_tests, efficiency2.poolfollowup_tests, efficiency2.ind_tests, efficiency2.ind_lfd, efficiency2.ind_lfd2) %>%
subset(., grepl("pool", key) | grepl("ind", key) & pool_size == 1) %>%
mutate(
    key=paste0(rename_labels(key), " (", pooling_type, ")"), 
    spread=paste("R =", spread), prevalence=paste("prv =", prevalence)) %>%
arrange(pool_size)
tt$key[grepl("Individual", tt$key)] <- gsub(" \\(Clustered)", "", tt$key[grepl("Individual", tt$key)])

ggplot(subset(tt, pool_size != 1), aes(x=pool_size, y=efficiency2)) +
geom_point(aes(colour=key)) +
geom_line(aes(colour=key)) +
geom_hline(data=subset(tt, grepl("Indiv", key)), aes(yintercept=efficiency2, linetype=key)) +
facet_grid(prevalence ~ spread) +
scale_fill_brewer(type="qual", palette=2) +
theme_bw() +
scale_x_continuous(breaks=subset(tt, pool_size != 1) %>% {unique(.$pool_size)}) +
labs(x="Pool size", linetype="", colour="", y="Positive predictive value / cost")

Need to know

Considerations

Other thoughts

If a pool is detected positive, is it possible to avoid having to test everyone in the pool to identify the infected individuals?

Group testing theory - when prevalence is low then pooling can be more efficient, basically just look for positive results in a pool and then follow up everyone in the pool. To maximise the efficiency of the pooling need to have a sense of the prevalence.

This paper uses tags along with pooling: https://advances.sciencemag.org/content/6/37/eabc5961

This is an R package for group testing: https://journal.r-project.org/archive/2010/RJ-2010-016/RJ-2010-016.pdf

https://linkinghub.elsevier.com/retrieve/pii/S1198743X20303499

https://theconversation.com/group-testing-for-coronavirus-called-pooled-testing-could-be-the-fastest-and-cheapest-way-to-increase-screening-nationwide-141579

Actually the extra costs probably don't matter too much esp when prevalence low